
Speaker verification based on the fusion of speech acoustics and inverted articulatory signals



Abstract

We propose a practical feature-level and score-level fusion approach that combines acoustic and estimated articulatory information for both text-independent and text-dependent speaker verification. From a practical point of view, we study how to improve speaker verification performance by combining dynamic articulatory information with conventional acoustic features. For text-independent speaker verification, we find that concatenating articulatory features obtained from measured speech production data with conventional Mel-frequency cepstral coefficients (MFCCs) improves performance dramatically. However, since directly measuring articulatory data is not feasible in many real-world applications, we also experiment with estimated articulatory features obtained through acoustic-to-articulatory inversion. We explore both feature-level and score-level fusion methods and find that overall system performance is significantly enhanced even with estimated articulatory features. Such a performance boost could be due to the inter-speaker variation information embedded in the estimated articulatory features. Since the dynamics of articulation contain important information, we also include inverted articulatory trajectories in text-dependent speaker verification. We demonstrate that the articulatory constraints introduced by inverted articulatory features help reject wrong-password trials and improve performance after score-level fusion. We evaluate the proposed methods on the X-ray Microbeam database and the RSR 2015 database, respectively, for the aforementioned two tasks. Experimental results show that we achieve more than 15% relative equal error rate reduction for both speaker verification tasks. (C) 2015 Elsevier Ltd. All rights reserved.
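The two fusion strategies the abstract names can be illustrated with a minimal sketch. This is not the paper's implementation: the feature dimensions, the equal fusion weight of 0.5, and the helper names are illustrative assumptions. Feature-level fusion here means frame-wise concatenation of MFCC and (estimated) articulatory vectors; score-level fusion is a simple convex combination of the two subsystems' trial scores.

```python
import numpy as np

def feature_level_fusion(mfcc, articulatory):
    # Frame-aligned concatenation: each frame's MFCC vector is
    # extended with that frame's (estimated) articulatory vector.
    assert mfcc.shape[0] == articulatory.shape[0], "features must be frame-aligned"
    return np.concatenate([mfcc, articulatory], axis=1)

def score_level_fusion(acoustic_score, articulatory_score, weight=0.5):
    # Convex combination of the two subsystem scores for one trial;
    # the weight would normally be tuned on a development set.
    return weight * acoustic_score + (1.0 - weight) * articulatory_score

def relative_eer_reduction(eer_baseline, eer_fused):
    # The abstract reports >15% *relative* EER reduction,
    # i.e. (baseline - fused) / baseline.
    return (eer_baseline - eer_fused) / eer_baseline

# Toy shapes: 100 frames, 13 MFCCs, 6 articulatory trajectories.
fused = feature_level_fusion(np.random.randn(100, 13), np.random.randn(100, 6))
print(fused.shape)  # (100, 19)
```

For example, a system whose EER drops from 8.0% to 6.8% after fusion would show `relative_eer_reduction(8.0, 6.8) = 0.15`, i.e. a 15% relative reduction.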
